library(readr)
library(tidyverse)
Registered S3 method overwritten by 'dplyr':
  method           from
  print.rowwise_df     
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.2.0     ✔ purrr   0.3.2
✔ tibble  2.1.3     ✔ dplyr   0.8.2
✔ tidyr   0.8.3     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
diamond <- read_csv("diamonds.csv")
Missing column names filled in: 'X1' [1]
Parsed with column specification:
cols(
  X1 = col_double(),
  carat = col_double(),
  cut = col_character(),
  color = col_character(),
  clarity = col_character(),
  depth = col_double(),
  table = col_double(),
  price = col_double(),
  x = col_double(),
  y = col_double(),
  z = col_double()
)
diamond_trim <- subset(diamond, select = c("carat", "x", "y", "z"))

Load the diamonds.csv data set and undertake an initial exploration of the data. You will find a description of the meanings of the variables on the relevant Kaggle page.
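
The initial exploration itself isn't shown above; a minimal sketch of what it might look like (either of these calls, or similar, would do):

glimpse(diamond)    # column types and a preview of each variable
summary(diamond)    # five-number summaries for the numeric columns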

We expect the carat of the diamonds to be strongly correlated with the physical dimensions x, y and z. Use ggpairs() to investigate correlations between these four variables.

library(GGally)
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2

Attaching package: ‘GGally’

The following object is masked from ‘package:dplyr’:

    nasa
ggpairs(diamond_trim)

So, we do find significant correlations. Let's drop columns x, y and z from the dataset, since we'll be using only carat going forward.
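
As a numerical cross-check (this line is an addition, not part of the original output), the same correlations can be pulled out directly:

cor(diamond_trim)    # correlation matrix of carat, x, y and z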

diamond_carat <- subset(diamond, select = -c(x, y, z))

We are interested in developing a regression model for the price of a diamond in terms of the possible predictor variables in the dataset. Use ggpairs() to investigate correlations between price and the predictors (this may take a while to run, don’t worry, make coffee or something).

ggpairs(diamond_carat)

Perform further ggplot visualisations of any significant correlations you find.

diamond_carat %>%
  ggplot(aes(x = carat, y = price)) +
  geom_point()

diamond_carat %>%
  ggplot(aes(x = cut, y = price)) +
  geom_boxplot()

diamond_carat %>%
  ggplot(aes(x = color, y = price)) +
  geom_boxplot()

diamond_carat %>%
  ggplot(aes(x = clarity, y = price)) +
  geom_boxplot()
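
The boxplots above can be backed up with a quick numerical summary (this is an addition, not part of the original output); for example, median price by cut, and the same pattern works for color and clarity:

diamond_carat %>%
  group_by(cut) %>%
  summarise(median_price = median(price),   # median is more robust than the mean for the skewed price
            n = n())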

Shortly we may try a regression fit using one or more of the categorical predictors cut, clarity and color, so let's investigate these predictors. Investigate the factor levels of these predictors: how many dummy variables do you expect for each of them?

length(unique(diamond_carat$cut))
[1] 5
length(unique(diamond_carat$color))
[1] 7
length(unique(diamond_carat$clarity))
[1] 8
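
With k levels a categorical predictor needs k - 1 dummies once a reference level is dropped, so we expect 4 for cut, 6 for color and 7 for clarity. A one-line check (an addition to the original):

sapply(diamond_carat[c("cut", "color", "clarity")], function(v) length(unique(v)) - 1)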

Use the dummy_cols() function in the fastDummies package to generate dummies for these predictors and check the number of dummies in each case.

library(fastDummies)
diamond_carat <- dummy_cols(diamond_carat, 
                      select_columns = c("cut", "color", "clarity"), 
                      remove_first_dummy = TRUE)
Registered S3 method overwritten by 'data.table':
  method           from
  print.data.table     
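
The check the task asks for isn't shown above; a minimal sketch, counting the generated dummy columns by prefix (this assumes dummy_cols()'s default "column_value" naming):

sum(startsWith(names(diamond_carat), "cut_"))       # expect 4
sum(startsWith(names(diamond_carat), "color_"))     # expect 6
sum(startsWith(names(diamond_carat), "clarity_"))   # expect 7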

Going forward we'll let R handle dummy variable creation for categorical predictors in regression fitting (remember lm() will generate the correct number of dummy levels automatically, absorbing one of the levels into the intercept as a reference level).

First, we'll start with simple linear regression. Regress price on carat and check the regression diagnostics.
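
As an illustrative aside (an addition, not in the original notebook), the coding lm() will use for a categorical predictor can be inspected directly:

head(model.matrix(~ cut, data = diamond_carat))   # intercept plus 4 cut dummies; "Fair" is absorbed as the reference level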

price_carat_model <- lm(price ~ carat, 
                        data = diamond_carat)
summary(price_carat_model)

Call:
lm(formula = price ~ carat, data = diamond_carat)

Residuals:
     Min       1Q   Median       3Q      Max 
-18585.3   -804.8    -18.9    537.4  12731.7 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2256.36      13.06  -172.8   <2e-16 ***
carat        7756.43      14.07   551.4   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1549 on 53938 degrees of freedom
Multiple R-squared:  0.8493,    Adjusted R-squared:  0.8493 
F-statistic: 3.041e+05 on 1 and 53938 DF,  p-value: < 2.2e-16
plot(price_carat_model)

Run a regression with one or both of the predictor and response variables log() transformed and recheck the diagnostics. Do you see any improvement?

diamond_carat <- diamond_carat %>%
  mutate(price_log = log(price),
         carat_log = log(carat))
log_log_price_carat <- lm(price_log ~ carat_log, 
                          data = diamond_carat)
summary(log_log_price_carat)

Call:
lm(formula = price_log ~ carat_log, data = diamond_carat)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.50833 -0.16951 -0.00591  0.16637  1.33793 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 8.448661   0.001365  6190.9   <2e-16 ***
carat_log   1.675817   0.001934   866.6   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2627 on 53938 degrees of freedom
Multiple R-squared:  0.933, Adjusted R-squared:  0.933 
F-statistic: 7.51e+05 on 1 and 53938 DF,  p-value: < 2.2e-16
plot(log_log_price_carat)

log_price <- lm(price_log ~ carat, 
                data = diamond_carat)
summary(log_price)

Call:
lm(formula = price_log ~ carat, data = diamond_carat)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.2844 -0.2449  0.0335  0.2578  1.5642 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 6.215021   0.003348    1856   <2e-16 ***
carat       1.969757   0.003608     546   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3972 on 53938 degrees of freedom
Multiple R-squared:  0.8468,    Adjusted R-squared:  0.8468 
F-statistic: 2.981e+05 on 1 and 53938 DF,  p-value: < 2.2e-16
plot(log_price)

log_carat <- lm(price ~ carat_log, 
                data = diamond_carat)
summary(log_carat)

Call:
lm(formula = price ~ carat_log, data = diamond_carat)

Residuals:
    Min      1Q  Median      3Q     Max 
-6137.4 -1495.1  -328.3  1141.9 12075.3 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6237.84      10.73   581.2   <2e-16 ***
carat_log    5836.02      15.21   383.8   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2066 on 53938 degrees of freedom
Multiple R-squared:  0.7319,    Adjusted R-squared:  0.7319 
F-statistic: 1.473e+05 on 1 and 53938 DF,  p-value: < 2.2e-16
plot(log_carat)

Let’s use log() transformations of both predictor and response. Next, experiment with adding a single categorical predictor into the model. Which categorical predictor is best? [Hint - investigate r2 values]

log_log_cut <- lm(price_log ~ carat_log + cut, 
                  data = diamond_carat)
log_log_color <- lm(price_log ~ carat_log + color, 
                  data = diamond_carat)
log_log_clarity <- lm(price_log ~ carat_log + clarity, 
                  data = diamond_carat)
summary(log_log_cut)

Call:
lm(formula = price_log ~ carat_log + cut, data = diamond_carat)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.52247 -0.16484 -0.00587  0.16087  1.38115 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  8.200125   0.006343 1292.69   <2e-16 ***
carat_log    1.695771   0.001910  887.68   <2e-16 ***
cutGood      0.163245   0.007324   22.29   <2e-16 ***
cutIdeal     0.317212   0.006632   47.83   <2e-16 ***
cutPremium   0.238217   0.006716   35.47   <2e-16 ***
cutVery Good 0.240753   0.006779   35.52   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2545 on 53934 degrees of freedom
Multiple R-squared:  0.9371,    Adjusted R-squared:  0.9371 
F-statistic: 1.607e+05 on 5 and 53934 DF,  p-value: < 2.2e-16
summary(log_log_color)

Call:
lm(formula = price_log ~ carat_log + color, data = diamond_carat)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.49987 -0.14888  0.00257  0.15316  1.27815 

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)    
(Intercept)  8.572034   0.003051 2809.531  < 2e-16 ***
carat_log    1.728631   0.001814  952.727  < 2e-16 ***
colorE      -0.025460   0.003748   -6.793 1.11e-11 ***
colorF      -0.034455   0.003773   -9.132  < 2e-16 ***
colorG      -0.055399   0.003653  -15.166  < 2e-16 ***
colorH      -0.189859   0.003917  -48.468  < 2e-16 ***
colorI      -0.286928   0.004383  -65.467  < 2e-16 ***
colorJ      -0.425038   0.005417  -78.466  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2372 on 53932 degrees of freedom
Multiple R-squared:  0.9454,    Adjusted R-squared:  0.9454 
F-statistic: 1.333e+05 on 7 and 53932 DF,  p-value: < 2.2e-16
summary(log_log_clarity)

Call:
lm(formula = price_log ~ carat_log + clarity, data = diamond_carat)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.97521 -0.12085  0.01048  0.12561  1.85854 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 7.768115   0.006940 1119.25   <2e-16 ***
carat_log   1.806424   0.001514 1193.23   <2e-16 ***
clarityIF   1.114625   0.008376  133.07   <2e-16 ***
claritySI1  0.624558   0.007163   87.19   <2e-16 ***
claritySI2  0.479658   0.007217   66.46   <2e-16 ***
clarityVS1  0.820461   0.007306  112.30   <2e-16 ***
clarityVS2  0.775248   0.007197  107.72   <2e-16 ***
clarityVVS1 1.028298   0.007745  132.77   <2e-16 ***
clarityVVS2 0.979221   0.007529  130.05   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1888 on 53931 degrees of freedom
Multiple R-squared:  0.9654,    Adjusted R-squared:  0.9654 
F-statistic: 1.879e+05 on 8 and 53931 DF,  p-value: < 2.2e-16
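
To answer the hint directly (this comparison is an addition to the original output), pull the adjusted R-squared values out of the three fits:

sapply(list(cut = log_log_cut, color = log_log_color, clarity = log_log_clarity),
       function(m) summary(m)$adj.r.squared)

From the summaries above these are roughly 0.937, 0.945 and 0.965, so clarity is the best single categorical predictor to add.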

Interpret the fitted coefficients for the levels of your chosen categorical predictor. Which level is the reference level? Which level shows the greatest difference in price from the reference level? [Hints - remember we are regressing the log(price) here, and think about what the presence of the log(carat) predictor implies. We’re not expecting a mathematical explanation]

unique(diamond_carat$clarity)
[1] "SI2"  "SI1"  "VS1"  "VS2"  "VVS2" "VVS1" "I1"   "IF"  

I1 is the reference level. For a fixed log(carat), the fitted log(price) is 1.114625 higher for clarity IF than for the reference level I1, the greatest difference from the reference of any level. Because the response is log(price), each coefficient is an additive shift on the log scale, i.e. a multiplicative factor of exp(coefficient) on price: a multiplicative factor below 1 gives a negative coefficient (price lower than the reference), and a factor above 1 gives a positive coefficient (price higher than the reference). Here every clarity coefficient is positive, so at a given carat every level is priced above I1.
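
A quick conversion of that largest coefficient to a price ratio (an addition for illustration):

exp(coef(log_log_clarity)["clarityIF"])   # exp(1.114625) is about 3.05: an IF diamond of the same carat is roughly three times the price of an I1 one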

2 Extension

Try adding an interaction between log(carat) and your chosen categorical predictor. Do you think this interaction term is statistically justified?

log_log_caratclarity <- lm(price_log ~ carat_log + clarity + carat_log:clarity, 
                  data = diamond_carat)
summary(log_log_caratclarity)

Call:
lm(formula = price_log ~ carat_log + clarity + carat_log:clarity, 
    data = diamond_carat)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.92773 -0.12104  0.01212  0.12465  1.51830 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           7.808102   0.007223 1080.98   <2e-16 ***
carat_log             1.528106   0.014944  102.25   <2e-16 ***
clarityIF             1.122732   0.011381   98.65   <2e-16 ***
claritySI1            0.587556   0.007465   78.71   <2e-16 ***
claritySI2            0.438797   0.007486   58.62   <2e-16 ***
clarityVS1            0.790989   0.007721  102.45   <2e-16 ***
clarityVS2            0.723109   0.007530   96.03   <2e-16 ***
clarityVVS1           1.007591   0.009506  106.00   <2e-16 ***
clarityVVS2           0.968426   0.008359  115.85   <2e-16 ***
carat_log:clarityIF   0.337116   0.017593   19.16   <2e-16 ***
carat_log:claritySI1  0.288214   0.015254   18.89   <2e-16 ***
carat_log:claritySI2  0.258795   0.015437   16.76   <2e-16 ***
carat_log:clarityVS1  0.300307   0.015395   19.51   <2e-16 ***
carat_log:clarityVS2  0.250187   0.015237   16.42   <2e-16 ***
carat_log:clarityVVS1 0.301955   0.016317   18.51   <2e-16 ***
carat_log:clarityVVS2 0.321665   0.015717   20.47   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1877 on 53924 degrees of freedom
Multiple R-squared:  0.9658,    Adjusted R-squared:  0.9658 
F-statistic: 1.014e+05 on 15 and 53924 DF,  p-value: < 2.2e-16

Hard to justify on fit alone: the R-squared barely moves (0.9654 to 0.9658), so the interaction adds very little in practical terms. Note, though, that with roughly 54,000 observations every interaction coefficient is individually significant (t-values of 16 to 20 above), so a formal test would retain it; it is the practical gain that is negligible.
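
A formal check (not run in the original notebook) is a nested-model F-test; given the individual t-values above it will report a highly significant improvement, which is why the R-squared comparison is the more useful practical yardstick here:

anova(log_log_clarity, log_log_caratclarity)   # F-test comparing the fits with and without the interaction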

Find and plot an appropriate visualisation to show the effect of this interaction

library(ggiraphExtra)
ggPredict(log_log_caratclarity)
NAs introduced by coercion
Error in `$<-.data.frame`(`*tmp*`, "tooltip", value = "for clarity=\n") : 
  replacement has 1 row, data has 0

ggPredict() errors out on this model, so we fall back to a base coplot() of log(price) against log(carat), with a separate regression line fitted within each clarity panel:

coplot(price_log ~ carat_log | clarity, 
       panel = function(x, y, ...){
         points(x, y)                      # scatter within each clarity panel
         abline(lm(y ~ x), col = "blue")   # per-panel fit: differing slopes show the interaction
       }, 
       overlap = 0.2,
       data = diamond_carat)
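
An alternative view of the same interaction (this plot is an addition, not part of the original notebook): one fitted line per clarity level on a single ggplot panel, so the differing slopes are directly comparable.

diamond_carat %>%
  ggplot(aes(x = carat_log, y = price_log, colour = clarity)) +
  geom_point(alpha = 0.1, size = 0.3) +      # thin out the ~54,000 points so the lines stay visible
  geom_smooth(method = "lm", se = FALSE)     # one regression line per clarity level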
